Annotation of Error Types for German Newsgroup Corpus

Authors

  • Markus Becker
  • Andrew Bredenkamp
  • Berthold Crysmann
  • Judith Klein
Abstract

This paper discusses the corpus annotation effort in the FLAG project and its application in the development of controlled language and grammar checking applications. A USENET corpus was collected and annotated using the error typology developed in the project. The DiET tool was used to support the automatic annotation effort, and to evaluate and validate the data. Finally, we report on some interesting aspects of the data which came out of our evaluation.


Similar resources

Tense, Modality and Polarity: The Finite Verbal Group in English and German Newsgroup Texts

This paper describes work in progress on a corpus-based study, comparing seemingly similar registers in two languages: English and German newsgroup texts, collected in the Bremen Translation Corpus. Systemic Functional Grammar (SFG, Halliday 1994 [1985]) provides a theoretical framework for categorizing empirical findings. I will focus on three systems of the finite verbal group, i.e. tense, mo...


EAGLE: an Error-Annotated Corpus of Beginning Learner German

This paper describes the Error-Annotated German Learner Corpus (EAGLE), a corpus of beginning learner German with grammatical error annotation. The corpus contains online workbook and hand-written essay data from learners in introductory German courses at The Ohio State University. We introduce an error typology developed for beginning learners of German that focuses on linguistic propertie...


Annotating Discourse Anaphora

In this paper, we present preliminary work on corpus-based anaphora resolution of discourse deixis in German. Our annotation guidelines provide linguistic tests for locating the antecedent, and for determining the semantic types of both the antecedent and the anaphor. The corpus consists of selected speaker turns from the Europarl corpus.


Towards Detecting Annotation Errors in Spoken Language Corpora

Consistency of corpus annotation is an essential property for the many uses of annotated corpora in computational and theoretical linguistics. While some research addresses the detection of inconsistencies in part-of-speech and other positional annotation (van Halteren, 2000; Eskin, 2000; Dickinson and Meurers, 2003a), only recently has there been some work in detecting errors in synt...


Combining Semantic Annotation of Word Sense & Semantic Roles: A Novel Annotation Scheme for VerbNet Roles on German Language Data

We present a VerbNet-based annotation scheme for semantic roles which we explore in an annotation study on German language data that combines word sense and semantic role annotation. We reannotate a substantial portion of the SALSA corpus with GermaNet senses and a revised scheme of VerbNet roles. We provide a detailed evaluation of the interaction between sense and role annotation. The resulti...




Publication date: 2002